{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7a765629",
   "metadata": {},
   "source": [
    "# Semantic Search using the Inference API with the Hugging Face Inference Endpoints Service\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/huggingface-integration-millions-of-documents-with-cohere-reranking.ipynb)\n",
    "\n",
    "\n",
    "Learn how to use the [Inference API](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-apis.html) with the Hugging Face Inference Endpoint service for semantic search."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9101eb9",
   "metadata": {},
   "source": [
    "# 🧰 Requirements\n",
    "\n",
    "For this example, you will need:\n",
    "\n",
    "- An Elastic deployment:\n",
    "   - We'll be using [Elastic serverless](https://www.elastic.co/docs/current/serverless) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))\n",
    "\n",
    "- Elasticsearch 8.14 or above.\n",
    "   \n",
    "- A paid [Hugging Face Inference Endpoint](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint) is required to use the Inference API with \n",
    "the Hugging Face Inference Endpoint service."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cd69cc0",
   "metadata": {},
   "source": [
    "# Create Elastic Cloud deployment or serverless project\n",
    "\n",
    "If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f27dffbf",
   "metadata": {},
   "source": [
    "# Install packages and connect with Elasticsearch Client\n",
    "\n",
    "To get started, we'll need to connect to our Elastic deployment using the Python client (version 8.12.0 or above).\n",
    "Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.\n",
    "\n",
    "First we need to `pip` install the following packages:\n",
    "\n",
    "- `elasticsearch`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c4b16bc",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install elasticsearch\n",
    "%pip install datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41ef96b3",
   "metadata": {},
   "source": [
    "Next, we need to import the modules we need. 🔐 NOTE: getpass enables us to securely prompt the user for credentials without echoing them to the terminal, or storing it in memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "690ff9af",
   "metadata": {},
   "outputs": [],
   "source": [
    "from elasticsearch import Elasticsearch, helpers\n",
    "from getpass import getpass\n",
    "import datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23fa2b6c",
   "metadata": {},
   "source": [
    "Now we can instantiate the Python Elasticsearch client.\n",
    "\n",
    "First we prompt the user for their password and Cloud ID.\n",
    "Then we create a `client` object that instantiates an instance of the `Elasticsearch` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "195cc597",
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
    "\n",
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
    "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n",
    "\n",
    "# Create the client instance\n",
    "client = Elasticsearch(\n",
    "    # For local development\n",
    "    # hosts=[\"http://localhost:9200\"]\n",
    "    cloud_id=ELASTIC_CLOUD_ID,\n",
    "    api_key=ELASTIC_API_KEY,\n",
    "    request_timeout=120,\n",
    "    max_retries=10,\n",
    "    retry_on_timeout=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1115ffb",
   "metadata": {},
   "source": [
    "### Test the Client\n",
    "Before you continue, confirm that the client has connected with this test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "cc0de5ea",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'name': 'serverless', 'cluster_name': 'd3ae40d244564c39961aa942d9d47f84', 'cluster_uuid': 'poKWeRbiS--nyD43R_NROw', 'version': {'number': '8.11.0', 'build_flavor': 'serverless', 'build_type': 'docker', 'build_hash': '00000000', 'build_date': '2023-10-31', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '8.11.0', 'minimum_index_compatibility_version': '8.11.0'}, 'tagline': 'You Know, for Search'}\n"
     ]
    }
   ],
   "source": [
    "print(client.info())\n",
    "\n",
    "\n",
    "# define this now so we can use it later\n",
    "def pretty_search_response(response):\n",
    "    if len(response[\"hits\"][\"hits\"]) == 0:\n",
    "        print(\"Your search returned no results.\")\n",
    "    else:\n",
    "        for hit in response[\"hits\"][\"hits\"]:\n",
    "            id = hit[\"_id\"]\n",
    "            score = hit[\"_score\"]\n",
    "            text = hit[\"_source\"][\"text_field\"]\n",
    "\n",
    "            pretty_output = f\"\\nID: {id}\\nScore: {score}\\nText: {text}\"\n",
    "\n",
    "            print(pretty_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "659c5890",
   "metadata": {},
   "source": [
    "Refer to [the documentation](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new) to learn how to connect to a self-managed deployment.\n",
    "\n",
    "Read [this page](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new) to learn how to connect using API keys."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "840d92f0",
   "metadata": {},
   "source": [
    "<a name=\"create-the-inference-endpoint\"></a>\n",
    "## Create the inference endpoint object\n",
    "\n",
    "Let's create the inference endpoint by using the [Create inference API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html).\n",
    "\n",
    "You'll need an Hugging Face API key (access token) for this that you can find in your Hugging Face account under the [Access Tokens](https://huggingface.co/settings/tokens).\n",
    "\n",
    "You will also need to have created a [Hugging Face Inference Endpoint service instance](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint) and noted the `url` of your instance. For this notebook, we deployed the `multilingual-e5-small` model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0d007737",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'inference_id': 'my_hf_endpoint_object', 'task_type': 'text_embedding', 'service': 'hugging_face', 'service_settings': {'url': 'https://yb0j0ol2xzvro0oc.us-east-1.aws.endpoints.huggingface.cloud', 'similarity': 'dot_product', 'dimensions': 384, 'rate_limit': {'requests_per_minute': 3000}}, 'task_settings': {}})"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "API_KEY = getpass(\"Huggingface API key:  \")\n",
    "client.inference.put(\n",
    "    inference_id=\"my_hf_endpoint_object\",\n",
    "    body={\n",
    "        \"service\": \"hugging_face\",\n",
    "        \"service_settings\": {\n",
    "            \"api_key\": API_KEY,\n",
    "            \"url\": \"<HF-URL>\",\n",
    "            \"similarity\": \"dot_product\",\n",
    "        },\n",
    "    },\n",
    "    task_type=\"text_embedding\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "67f4201d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'text_embedding': [{'embedding': [0.026027203, -0.011120652, -0.048804738, -0.108695105, 0.06134937, -0.003066093, 0.053232085, 0.103629395, 0.046043355, 0.0055427994, 0.036174323, 0.022110537, 0.084891565, -0.008215214, -0.017915571, 0.041923355, 0.048264034, -0.0404355, -0.02609504, -0.023076748, 0.0077286777, 0.023034474, 0.010379155, 0.06257496, 0.025658935, 0.040398516, -0.059809092, 0.032451782, 0.020798752, -0.053219322, -0.0447653, -0.033474423, 0.085040554, -0.051343303, 0.081006914, 0.026895791, -0.031822708, -0.06217641, 0.069435075, -0.055062667, -0.014967285, -0.0040517864, 0.03874908, 0.07854211, 0.017526977, 0.040629108, -0.023190023, 0.056913305, -0.06422566, -0.009403182, -0.06666503, 0.035270344, 0.004515737, 0.07347306, 0.011125566, -0.07184689, -0.08095445, -0.04214626, -0.108447045, -0.019494658, 0.06303337, 0.019757038, -0.014584281, 0.060923614, 0.06465893, 0.108431116, 0.04072316, 0.03705652, -0.06975359, -0.050562095, -0.058487326, 0.05989619, 0.008454561, -0.02706363, -0.017974045, 0.030698266, 0.046484154, -0.06212431, 0.009513307, -0.056369964, -0.052940592, -0.05834985, -0.02096531, 0.03910419, -0.054484386, 0.06231919, 0.044607673, -0.064030685, 0.067746714, -0.0291515, 0.06992093, 0.06300958, -0.07530936, -0.06167211, -0.0681666, -0.042375665, -0.05200085, 0.058336657, 0.039630838, -0.03444309, 0.030615594, -0.042388055, 0.03127304, -0.059075136, -0.05925558, 0.019864058, 0.0311022, -0.11285156, 0.02264027, -0.0676216, 0.011842404, -0.0157365, 0.06580391, 0.023665493, -0.05072435, -0.039492164, -0.06390325, -0.067074455, 0.032680944, -0.05243909, 0.06721114, -0.005195616, -0.0458316, -0.046202496, -0.07942237, -0.011754681, 0.026515028, 0.04761297, 0.08130492, 0.0118014645, 0.025956452, 0.039976373, 0.050196614, 0.052609406, 0.063223615, 0.06121741, -0.028745022, 0.0008677591, 0.038760003, -0.021240402, -0.073974326, 0.0548761, -0.047403768, 0.025582938, 0.0585596, 0.056284837, 0.08381001, -0.02149303, 0.09447917, -0.04940235, 0.018470071, -0.044996567, 0.08062048, 0.05162519, 0.053831138, -0.052980945, -0.08226773, -0.068137355, 0.028439872, 0.049932946, -0.07633764, -0.08649836, -0.07108301, 0.017650153, -0.065348, -0.038191773, 0.040068675, 0.05870959, -0.04707911, -0.04340612, -0.044621766, 0.030800574, -0.042227603, 0.0604754, 0.010891958, 0.057460006, -0.046362966, 0.046009373, 0.07293652, 0.09398854, -0.017035728, -0.010618687, -0.09326647, -0.03877647, -0.026517635, -0.047411792, -0.073266074, 0.033911563, 0.0642687, -0.02208107, 0.0040624263, -0.003194478, -0.082016475, -0.088730805, -0.084694624, -0.03364641, -0.05026475, 0.051665384, 0.058177516, 0.02759865, -0.034461632, 0.0027396793, 0.013807217, 0.040009033, 0.06346369, 0.05832441, -0.07451158, 0.028601868, -0.022494016, 0.04229324, 0.027883757, -0.0673137, -0.07119014, 0.047188714, -0.033077974, -0.028302893, -0.028704679, 0.043902606, -0.05147592, 0.045782477, 0.08077521, -0.01782404, 0.0242885, -0.0711172, -0.023565968, 0.041291755, 0.084907316, -0.101972945, -0.038989857, 0.025122978, -0.014144972, -0.010975231, -0.0357049, -0.09243826, -0.023552464, -0.08525497, -0.018912667, 0.049455214, 0.06532829, -0.031223357, -0.013451132, -0.00037671064, 0.04600707, -0.057603396, 0.08035837, -0.026429964, -0.0962299, 0.022606302, -0.0116137, 0.062264528, 0.033446472, -0.06123555, -0.09909991, -0.07459225, -0.018707436, 0.028753517, 0.06808565, 0.023965191, -0.04717076, 0.026551146, 0.019655682, -0.009233348, 0.10465723, 0.046420176, 0.03295103, 0.053024694, -0.03854051, -0.0058735567, -0.061238136, -0.048678573, -0.05362055, 0.048028357, 0.003013557, -0.06505121, -0.020536456, -0.020093206, 0.014102229, 0.10254222, -0.027084326, -0.061477777, 0.03478813, -0.00029115603, 0.053552967, 0.056773122, 0.048566766, 0.027371235, -0.015398839, 0.0511229, -0.03932426, -0.043879736, -0.03872225, -0.08171432, 0.01703992, -0.04535995, 0.03194781, 0.011413799, 0.036786903, 0.021306055, -0.06722324, 0.034231987, -0.027529748, -0.059552487, 0.050244797, 0.08905617, -0.071323626, 0.05047076, 0.003429174, 0.034673557, 0.009984501, 0.056842286, 0.0683513, 0.023990847, -0.04053898, -0.022724004, 0.026175855, 0.027319307, -0.055451974, -0.053907238, -0.05359307, -0.035025068, -0.03776361, -0.02973751, -0.037610233, -0.051089168, 0.04428633, 0.06276192, -0.03754498, -0.060270913, 0.043127347, 0.016669549, 0.024885416, -0.027190097, -0.011614101, 0.077848606, -0.007924398, -0.061833344, -0.015071012, 0.023127502, -0.07634841, -0.015780756, 0.031652045, 0.0031123296, -0.032643825, 0.05640234, -0.02685534, -0.04942714, 0.048498664, 0.00043902535, -0.043975227, 0.017389799, 0.07734344, -0.090009265, 0.019997133, 0.10055134, -0.05671741, 0.048755262, -0.02514076, -0.011394784, 0.049053214, 0.04264309, -0.06451125, -0.029034287, 0.07762039, 0.06809162, 0.059983794, 0.035379365, -0.007960272, 0.019705113, -0.02518122, -0.05767321, 0.038523413, 0.081652805, -0.032829504, -0.0023197657, -0.018218426, -0.0885769, -0.094963886, 0.057851806, -0.041729856, -0.045802936, 0.0570079, 0.047811687, 0.017810043, 0.09373594]}]})"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.inference.inference(\n",
    "    inference_id=\"my_hf_endpoint_object\", input=\"this is the raw text of my document!\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f2e48b7",
   "metadata": {},
   "source": [
    "**IMPORTANT:** If you use Elasticsearch 8.12, you must change `inference_id` in the snippet above to `model_id`! "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0346151d",
   "metadata": {},
   "source": [
    "#"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1024d070",
   "metadata": {},
   "source": [
    "## Create an ingest pipeline with an inference processor\n",
    "\n",
    "Create an ingest pipeline with an inference processor by using the [`put_pipeline`](https://www.elastic.co/guide/en/elasticsearch/reference/master/put-pipeline-api.html) method. Reference the `inference_id` created above as `model_id` to infer on the data that is being ingested by the pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6ace9e2e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True})"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.ingest.put_pipeline(\n",
    "    id=\"hf_pipeline\",\n",
    "    processors=[\n",
    "        {\n",
    "            \"inference\": {\n",
    "                \"model_id\": \"my_hf_endpoint_object\",\n",
    "                \"input_output\": {\n",
    "                    \"input_field\": \"text_field\",\n",
    "                    \"output_field\": \"text_embedding\",\n",
    "                },\n",
    "            }\n",
    "        }\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76d07567",
   "metadata": {},
   "source": [
    "Let's note a few important parameters from that API call:\n",
    "\n",
    "- `inference`: A processor that performs inference using a machine learning model.\n",
    "- `model_id`: Specifies the ID of the inference endpoint to be used. In this example, the inference ID is set to `my_hf_endpoint_object`. Use the inference ID you defined when created the inference task.\n",
    "- `input_output`: Specifies input and output fields.\n",
    "- `input_field`: Field name from which the `dense_vector` representation is created.\n",
    "- `output_field`:  Field name which contains inference results. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28e12d7a",
   "metadata": {},
   "source": [
    "## Create index\n",
    "\n",
    "The mapping of the destination index - the index that contains the embeddings that the model will create based on your input text - must be created. The destination index must have a field with the [dense_vector](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html) field type to index the output of the model we deployed in Hugging Face (`multilingual-e5-small`).\n",
    "\n",
    "Let's create an index named `hf-endpoint-index` with the mappings we need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6ddcbca3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'hf-endpoint-index'})"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.indices.create(\n",
    "    index=\"hf-endpoint-index\",\n",
    "    settings={\n",
    "        \"index\": {\n",
    "            \"default_pipeline\": \"hf_pipeline\",\n",
    "        }\n",
    "    },\n",
    "    mappings={\n",
    "        \"properties\": {\n",
    "            \"text\": {\"type\": \"text\"},\n",
    "            \"text_embedding\": {\n",
    "                \"type\": \"dense_vector\",\n",
    "                \"dims\": 384,\n",
    "                \"similarity\": \"dot_product\",\n",
    "            },\n",
    "        }\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12aa3452",
   "metadata": {},
   "source": [
    "## If you are using Elasticsearch serverless or v8.15+ then you will have access to the new `semantic_text` field\n",
    "`semantic_text` has significantly faster ingest times and is recommended.\n",
    "\n",
    "https://github.com/elastic/elasticsearch/blob/main/docs/reference/mapping/types/semantic-text.asciidoc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5eeb3755",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'hf-semantic-text-index'})"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.indices.create(\n",
    "    index=\"hf-semantic-text-index\",\n",
    "    mappings={\n",
    "        \"properties\": {\n",
    "            \"infer_field\": {\n",
    "                \"type\": \"semantic_text\",\n",
    "                \"inference_id\": \"my_hf_endpoint_object\",\n",
    "            },\n",
    "            \"text_field\": {\"type\": \"text\", \"copy_to\": \"infer_field\"},\n",
    "        }\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07c187a9",
   "metadata": {},
   "source": [
    "## Insert Documents\n",
    "\n",
    "In this example, we want to show the power of using GPUs in Hugging Face's Inference Endpoint service by indexing millions of multilingual documents from the miracl corpus. The speed at which these documents ingest will depend on whether you use a semantic text field (faster) or an ingest pipeline (slower) and will also depend on how much hardware your rent for your Hugging Face inference endpoint. Using a semantic_text field with a single T4 GPU, it may take about 3 hours to index 1 million documents. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "d68737cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ca38aa5c44824945a744c6c3a75756ce",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "miracl-corpus.py:   0%|          | 0.00/3.15k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2e8b8c5ce485492ab80cbd34df4b1e39",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "README.md:   0%|          | 0.00/6.85k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "90a4a2866aca4834ac3561be8f2815ac",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading dataset shards:   0%|          | 0/28 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "langs = [\n",
    "    \"ar\",\n",
    "    \"bn\",\n",
    "    \"en\",\n",
    "    \"es\",\n",
    "    \"fa\",\n",
    "    \"fi\",\n",
    "    \"fr\",\n",
    "    \"hi\",\n",
    "    \"id\",\n",
    "    \"ja\",\n",
    "    \"ko\",\n",
    "    \"ru\",\n",
    "    \"sw\",\n",
    "    \"te\",\n",
    "    \"th\",\n",
    "    \"zh\",\n",
    "]\n",
    "\n",
    "\n",
    "all_langs_datasets = [\n",
    "    iter(datasets.load_dataset(\"miracl/miracl-corpus\", lang)[\"train\"]) for lang in langs\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7ccf9835",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Docs uplaoded: 1000\n",
      "Docs uplaoded: 2000\n"
     ]
    },
    {
     "ename": "KeyboardInterrupt",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[11], line 27\u001b[0m\n\u001b[1;32m     17\u001b[0m             \u001b[38;5;66;03m# if you are using an ingest pipeline instead of a\u001b[39;00m\n\u001b[1;32m     18\u001b[0m             \u001b[38;5;66;03m# semantic text field, use this instead:\u001b[39;00m\n\u001b[1;32m     19\u001b[0m             \u001b[38;5;66;03m# documents.append(\u001b[39;00m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     23\u001b[0m             \u001b[38;5;66;03m#     }\u001b[39;00m\n\u001b[1;32m     24\u001b[0m             \u001b[38;5;66;03m# )\u001b[39;00m\n\u001b[1;32m     26\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 27\u001b[0m     response \u001b[38;5;241m=\u001b[39m \u001b[43mhelpers\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mbulk\u001b[49m\u001b[43m(\u001b[49m\u001b[43mclient\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdocuments\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mraise_on_error\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m60s\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m     28\u001b[0m     \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mDocs uplaoded:\u001b[39m\u001b[38;5;124m\"\u001b[39m, (j \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m) \u001b[38;5;241m*\u001b[39m MAX_BULK_SIZE)\n\u001b[1;32m     30\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elasticsearch/helpers/actions.py:531\u001b[0m, in \u001b[0;36mbulk\u001b[0;34m(client, actions, stats_only, ignore_status, *args, **kwargs)\u001b[0m\n\u001b[1;32m    529\u001b[0m \u001b[38;5;66;03m# make streaming_bulk yield successful results so we can count them\u001b[39;00m\n\u001b[1;32m    530\u001b[0m kwargs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124myield_ok\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[0;32m--> 531\u001b[0m \u001b[43m\u001b[49m\u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mok\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mitem\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mstreaming_bulk\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    532\u001b[0m \u001b[43m    \u001b[49m\u001b[43mclient\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mactions\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mignore_status\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mignore_status\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mspan_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mhelpers.bulk\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m  \u001b[49m\u001b[38;5;66;43;03m# type: ignore[misc]\u001b[39;49;00m\n\u001b[1;32m    533\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\u001b[43m:\u001b[49m\n\u001b[1;32m    534\u001b[0m \u001b[43m    \u001b[49m\u001b[38;5;66;43;03m# go through request-response pairs and detect failures\u001b[39;49;00m\n\u001b[1;32m    535\u001b[0m \u001b[43m    \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;129;43;01mnot\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mok\u001b[49m\u001b[43m:\u001b[49m\n\u001b[1;32m    536\u001b[0m \u001b[43m        \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;129;43;01mnot\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mstats_only\u001b[49m\u001b[43m:\u001b[49m\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elasticsearch/helpers/actions.py:445\u001b[0m, in \u001b[0;36mstreaming_bulk\u001b[0;34m(client, actions, chunk_size, max_chunk_bytes, raise_on_error, expand_action_callback, raise_on_exception, max_retries, initial_backoff, max_backoff, yield_ok, ignore_status, span_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m    442\u001b[0m     time\u001b[38;5;241m.\u001b[39msleep(\u001b[38;5;28mmin\u001b[39m(max_backoff, initial_backoff \u001b[38;5;241m*\u001b[39m \u001b[38;5;241m2\u001b[39m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m (attempt \u001b[38;5;241m-\u001b[39m \u001b[38;5;241m1\u001b[39m)))\n\u001b[1;32m    444\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 445\u001b[0m \u001b[43m    \u001b[49m\u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m(\u001b[49m\u001b[43mok\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43minfo\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43mzip\u001b[39;49m\u001b[43m(\u001b[49m\n\u001b[1;32m    446\u001b[0m \u001b[43m        \u001b[49m\u001b[43mbulk_data\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    447\u001b[0m \u001b[43m        \u001b[49m\u001b[43m_process_bulk_chunk\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    448\u001b[0m \u001b[43m            \u001b[49m\u001b[43mclient\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    449\u001b[0m \u001b[43m            \u001b[49m\u001b[43mbulk_actions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    450\u001b[0m \u001b[43m            \u001b[49m\u001b[43mbulk_data\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    451\u001b[0m \u001b[43m            \u001b[49m\u001b[43motel_span\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    452\u001b[0m \u001b[43m            \u001b[49m\u001b[43mraise_on_exception\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    453\u001b[0m \u001b[43m            \u001b[49m\u001b[43mraise_on_error\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    454\u001b[0m \u001b[43m            \u001b[49m\u001b[43mignore_status\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    455\u001b[0m \u001b[43m            \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    456\u001b[0m \u001b[43m            \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    457\u001b[0m \u001b[43m        \u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    458\u001b[0m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\u001b[43m:\u001b[49m\n\u001b[1;32m    459\u001b[0m \u001b[43m        \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;129;43;01mnot\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mok\u001b[49m\u001b[43m:\u001b[49m\n\u001b[1;32m    460\u001b[0m \u001b[43m            \u001b[49m\u001b[43maction\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43minfo\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[43minfo\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpopitem\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elasticsearch/helpers/actions.py:343\u001b[0m, in \u001b[0;36m_process_bulk_chunk\u001b[0;34m(client, bulk_actions, bulk_data, otel_span, raise_on_exception, raise_on_error, ignore_status, *args, **kwargs)\u001b[0m\n\u001b[1;32m    339\u001b[0m     ignore_status \u001b[38;5;241m=\u001b[39m (ignore_status,)\n\u001b[1;32m    341\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m    342\u001b[0m     \u001b[38;5;66;03m# send the actual request\u001b[39;00m\n\u001b[0;32m--> 343\u001b[0m     resp \u001b[38;5;241m=\u001b[39m \u001b[43mclient\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mbulk\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moperations\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbulk_actions\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m  \u001b[38;5;66;03m# type: ignore[arg-type]\u001b[39;00m\n\u001b[1;32m    344\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m ApiError \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m    345\u001b[0m     gen \u001b[38;5;241m=\u001b[39m _process_bulk_chunk_error(\n\u001b[1;32m    346\u001b[0m         error\u001b[38;5;241m=\u001b[39me,\n\u001b[1;32m    347\u001b[0m         bulk_data\u001b[38;5;241m=\u001b[39mbulk_data,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    350\u001b[0m         raise_on_error\u001b[38;5;241m=\u001b[39mraise_on_error,\n\u001b[1;32m    351\u001b[0m     )\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elasticsearch/_sync/client/utils.py:446\u001b[0m, in \u001b[0;36m_rewrite_parameters.<locals>.wrapper.<locals>.wrapped\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m    443\u001b[0m         \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m:\n\u001b[1;32m    444\u001b[0m             \u001b[38;5;28;01mpass\u001b[39;00m\n\u001b[0;32m--> 446\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mapi\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elasticsearch/_sync/client/__init__.py:717\u001b[0m, in \u001b[0;36mElasticsearch.bulk\u001b[0;34m(self, operations, body, index, error_trace, filter_path, human, pipeline, pretty, refresh, require_alias, routing, source, source_excludes, source_includes, timeout, wait_for_active_shards)\u001b[0m\n\u001b[1;32m    712\u001b[0m __body \u001b[38;5;241m=\u001b[39m operations \u001b[38;5;28;01mif\u001b[39;00m operations \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;28;01melse\u001b[39;00m body\n\u001b[1;32m    713\u001b[0m __headers \u001b[38;5;241m=\u001b[39m {\n\u001b[1;32m    714\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124maccept\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mapplication/json\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m    715\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcontent-type\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mapplication/x-ndjson\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m    716\u001b[0m }\n\u001b[0;32m--> 717\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mperform_request\u001b[49m\u001b[43m(\u001b[49m\u001b[43m  \u001b[49m\u001b[38;5;66;43;03m# type: ignore[return-value]\u001b[39;49;00m\n\u001b[1;32m    718\u001b[0m \u001b[43m    \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mPUT\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m    719\u001b[0m \u001b[43m    \u001b[49m\u001b[43m__path\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    720\u001b[0m \u001b[43m    \u001b[49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m__query\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    721\u001b[0m \u001b[43m    \u001b[49m\u001b[43mheaders\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m__headers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    722\u001b[0m \u001b[43m    \u001b[49m\u001b[43mbody\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m__body\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    723\u001b[0m \u001b[43m    \u001b[49m\u001b[43mendpoint_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mbulk\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m    724\u001b[0m \u001b[43m    \u001b[49m\u001b[43mpath_parts\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m__path_parts\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    725\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py:271\u001b[0m, in \u001b[0;36mBaseClient.perform_request\u001b[0;34m(self, method, path, params, headers, body, endpoint_id, path_parts)\u001b[0m\n\u001b[1;32m    255\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mperform_request\u001b[39m(\n\u001b[1;32m    256\u001b[0m     \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m    257\u001b[0m     method: \u001b[38;5;28mstr\u001b[39m,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    264\u001b[0m     path_parts: Optional[Mapping[\u001b[38;5;28mstr\u001b[39m, Any]] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m    265\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m ApiResponse[Any]:\n\u001b[1;32m    266\u001b[0m     \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_otel\u001b[38;5;241m.\u001b[39mspan(\n\u001b[1;32m    267\u001b[0m         method,\n\u001b[1;32m    268\u001b[0m         endpoint_id\u001b[38;5;241m=\u001b[39mendpoint_id,\n\u001b[1;32m    269\u001b[0m         path_parts\u001b[38;5;241m=\u001b[39mpath_parts \u001b[38;5;129;01mor\u001b[39;00m {},\n\u001b[1;32m    270\u001b[0m     ) \u001b[38;5;28;01mas\u001b[39;00m otel_span:\n\u001b[0;32m--> 271\u001b[0m         response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_perform_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    272\u001b[0m \u001b[43m            \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    273\u001b[0m \u001b[43m            \u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    274\u001b[0m \u001b[43m            \u001b[49m\u001b[43mparams\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mparams\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    275\u001b[0m \u001b[43m            \u001b[49m\u001b[43mheaders\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mheaders\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    276\u001b[0m \u001b[43m            \u001b[49m\u001b[43mbody\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbody\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    277\u001b[0m \u001b[43m            \u001b[49m\u001b[43motel_span\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43motel_span\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    278\u001b[0m \u001b[43m        \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    279\u001b[0m         otel_span\u001b[38;5;241m.\u001b[39mset_elastic_cloud_metadata(response\u001b[38;5;241m.\u001b[39mmeta\u001b[38;5;241m.\u001b[39mheaders)\n\u001b[1;32m    280\u001b[0m         \u001b[38;5;28;01mreturn\u001b[39;00m response\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elasticsearch/_sync/client/_base.py:316\u001b[0m, in \u001b[0;36mBaseClient._perform_request\u001b[0;34m(self, method, path, params, headers, body, otel_span)\u001b[0m\n\u001b[1;32m    313\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m    314\u001b[0m     target \u001b[38;5;241m=\u001b[39m path\n\u001b[0;32m--> 316\u001b[0m meta, resp_body \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtransport\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mperform_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    317\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    318\u001b[0m \u001b[43m    \u001b[49m\u001b[43mtarget\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    319\u001b[0m \u001b[43m    \u001b[49m\u001b[43mheaders\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest_headers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    320\u001b[0m \u001b[43m    \u001b[49m\u001b[43mbody\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbody\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    321\u001b[0m \u001b[43m    \u001b[49m\u001b[43mrequest_timeout\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_request_timeout\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    322\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmax_retries\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_max_retries\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    323\u001b[0m \u001b[43m    \u001b[49m\u001b[43mretry_on_status\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_retry_on_status\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    324\u001b[0m \u001b[43m    \u001b[49m\u001b[43mretry_on_timeout\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_retry_on_timeout\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    325\u001b[0m \u001b[43m    \u001b[49m\u001b[43mclient_meta\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_client_meta\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    326\u001b[0m \u001b[43m    \u001b[49m\u001b[43motel_span\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43motel_span\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    327\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    329\u001b[0m \u001b[38;5;66;03m# HEAD with a 404 is returned as a normal response\u001b[39;00m\n\u001b[1;32m    330\u001b[0m \u001b[38;5;66;03m# since this is used as an 'exists' functionality.\u001b[39;00m\n\u001b[1;32m    331\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (method \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mHEAD\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m meta\u001b[38;5;241m.\u001b[39mstatus \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m404\u001b[39m) \u001b[38;5;129;01mand\u001b[39;00m (\n\u001b[1;32m    332\u001b[0m     \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;241m200\u001b[39m \u001b[38;5;241m<\u001b[39m\u001b[38;5;241m=\u001b[39m meta\u001b[38;5;241m.\u001b[39mstatus \u001b[38;5;241m<\u001b[39m \u001b[38;5;241m299\u001b[39m\n\u001b[1;32m    333\u001b[0m     \u001b[38;5;129;01mand\u001b[39;00m (\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    337\u001b[0m     )\n\u001b[1;32m    338\u001b[0m ):\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elastic_transport/_transport.py:342\u001b[0m, in \u001b[0;36mTransport.perform_request\u001b[0;34m(self, method, target, body, headers, max_retries, retry_on_status, retry_on_timeout, request_timeout, client_meta, otel_span)\u001b[0m\n\u001b[1;32m    340\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m    341\u001b[0m     otel_span\u001b[38;5;241m.\u001b[39mset_node_metadata(node\u001b[38;5;241m.\u001b[39mhost, node\u001b[38;5;241m.\u001b[39mport, node\u001b[38;5;241m.\u001b[39mbase_url, target)\n\u001b[0;32m--> 342\u001b[0m     resp \u001b[38;5;241m=\u001b[39m \u001b[43mnode\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mperform_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    343\u001b[0m \u001b[43m        \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    344\u001b[0m \u001b[43m        \u001b[49m\u001b[43mtarget\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    345\u001b[0m \u001b[43m        \u001b[49m\u001b[43mbody\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest_body\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    346\u001b[0m \u001b[43m        \u001b[49m\u001b[43mheaders\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest_headers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    347\u001b[0m \u001b[43m        \u001b[49m\u001b[43mrequest_timeout\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest_timeout\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    348\u001b[0m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    349\u001b[0m     _logger\u001b[38;5;241m.\u001b[39minfo(\n\u001b[1;32m    350\u001b[0m         \u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m [status:\u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m duration:\u001b[39m\u001b[38;5;132;01m%.3f\u001b[39;00m\u001b[38;5;124ms]\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m    351\u001b[0m         \u001b[38;5;241m%\u001b[39m (\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    357\u001b[0m         )\n\u001b[1;32m    358\u001b[0m     )\n\u001b[1;32m    360\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m method \u001b[38;5;241m!=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mHEAD\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/elastic_transport/_node/_http_urllib3.py:167\u001b[0m, in \u001b[0;36mUrllib3HttpNode.perform_request\u001b[0;34m(self, method, target, body, headers, request_timeout)\u001b[0m\n\u001b[1;32m    164\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m    165\u001b[0m     body_to_send \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m--> 167\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43murlopen\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    168\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    169\u001b[0m \u001b[43m    \u001b[49m\u001b[43mtarget\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    170\u001b[0m \u001b[43m    \u001b[49m\u001b[43mbody\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbody_to_send\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    171\u001b[0m \u001b[43m    \u001b[49m\u001b[43mretries\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mRetry\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    172\u001b[0m \u001b[43m    \u001b[49m\u001b[43mheaders\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrequest_headers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    173\u001b[0m \u001b[43m    \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkw\u001b[49m\u001b[43m,\u001b[49m\u001b[43m  \u001b[49m\u001b[38;5;66;43;03m# type: ignore[arg-type]\u001b[39;49;00m\n\u001b[1;32m    174\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    175\u001b[0m response_headers \u001b[38;5;241m=\u001b[39m HttpHeaders(response\u001b[38;5;241m.\u001b[39mheaders)\n\u001b[1;32m    176\u001b[0m data \u001b[38;5;241m=\u001b[39m response\u001b[38;5;241m.\u001b[39mdata\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py:789\u001b[0m, in \u001b[0;36mHTTPConnectionPool.urlopen\u001b[0;34m(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)\u001b[0m\n\u001b[1;32m    786\u001b[0m response_conn \u001b[38;5;241m=\u001b[39m conn \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m release_conn \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m    788\u001b[0m \u001b[38;5;66;03m# Make the request on the HTTPConnection object\u001b[39;00m\n\u001b[0;32m--> 789\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_make_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    790\u001b[0m \u001b[43m    \u001b[49m\u001b[43mconn\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    791\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    792\u001b[0m \u001b[43m    \u001b[49m\u001b[43murl\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    793\u001b[0m \u001b[43m    \u001b[49m\u001b[43mtimeout\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtimeout_obj\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    794\u001b[0m \u001b[43m    \u001b[49m\u001b[43mbody\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbody\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    795\u001b[0m \u001b[43m    \u001b[49m\u001b[43mheaders\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mheaders\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    796\u001b[0m \u001b[43m    \u001b[49m\u001b[43mchunked\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mchunked\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    797\u001b[0m \u001b[43m    \u001b[49m\u001b[43mretries\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mretries\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    798\u001b[0m \u001b[43m    \u001b[49m\u001b[43mresponse_conn\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mresponse_conn\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    799\u001b[0m \u001b[43m    \u001b[49m\u001b[43mpreload_content\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpreload_content\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    800\u001b[0m \u001b[43m    \u001b[49m\u001b[43mdecode_content\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdecode_content\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    801\u001b[0m \u001b[43m    \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mresponse_kw\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    802\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    804\u001b[0m \u001b[38;5;66;03m# Everything went great!\u001b[39;00m\n\u001b[1;32m    805\u001b[0m clean_exit \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py:536\u001b[0m, in \u001b[0;36mHTTPConnectionPool._make_request\u001b[0;34m(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)\u001b[0m\n\u001b[1;32m    534\u001b[0m \u001b[38;5;66;03m# Receive the response from the server\u001b[39;00m\n\u001b[1;32m    535\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 536\u001b[0m     response \u001b[38;5;241m=\u001b[39m \u001b[43mconn\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgetresponse\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    537\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (BaseSSLError, \u001b[38;5;167;01mOSError\u001b[39;00m) \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m    538\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_raise_timeout(err\u001b[38;5;241m=\u001b[39me, url\u001b[38;5;241m=\u001b[39murl, timeout_value\u001b[38;5;241m=\u001b[39mread_timeout)\n",
      "File \u001b[0;32m~/workplace/elasticsearch-labs/.venv/lib/python3.11/site-packages/urllib3/connection.py:464\u001b[0m, in \u001b[0;36mHTTPConnection.getresponse\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    461\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mresponse\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m HTTPResponse\n\u001b[1;32m    463\u001b[0m \u001b[38;5;66;03m# Get the response from http.client.HTTPConnection\u001b[39;00m\n\u001b[0;32m--> 464\u001b[0m httplib_response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgetresponse\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    466\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m    467\u001b[0m     assert_header_parsing(httplib_response\u001b[38;5;241m.\u001b[39mmsg)\n",
      "File \u001b[0;32m~/.pyenv/versions/3.11.4/lib/python3.11/http/client.py:1378\u001b[0m, in \u001b[0;36mHTTPConnection.getresponse\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m   1376\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m   1377\u001b[0m     \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m-> 1378\u001b[0m         \u001b[43mresponse\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mbegin\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   1379\u001b[0m     \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mConnectionError\u001b[39;00m:\n\u001b[1;32m   1380\u001b[0m         \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mclose()\n",
      "File \u001b[0;32m~/.pyenv/versions/3.11.4/lib/python3.11/http/client.py:318\u001b[0m, in \u001b[0;36mHTTPResponse.begin\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    316\u001b[0m \u001b[38;5;66;03m# read until we get a non-100 response\u001b[39;00m\n\u001b[1;32m    317\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:\n\u001b[0;32m--> 318\u001b[0m     version, status, reason \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_read_status\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    319\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m status \u001b[38;5;241m!=\u001b[39m CONTINUE:\n\u001b[1;32m    320\u001b[0m         \u001b[38;5;28;01mbreak\u001b[39;00m\n",
      "File \u001b[0;32m~/.pyenv/versions/3.11.4/lib/python3.11/http/client.py:279\u001b[0m, in \u001b[0;36mHTTPResponse._read_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    278\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_read_status\u001b[39m(\u001b[38;5;28mself\u001b[39m):\n\u001b[0;32m--> 279\u001b[0m     line \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mstr\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfp\u001b[38;5;241m.\u001b[39mreadline(_MAXLINE \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m), \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124miso-8859-1\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m    280\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(line) \u001b[38;5;241m>\u001b[39m _MAXLINE:\n\u001b[1;32m    281\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m LineTooLong(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mstatus line\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
      "File \u001b[0;32m~/.pyenv/versions/3.11.4/lib/python3.11/socket.py:706\u001b[0m, in \u001b[0;36mSocketIO.readinto\u001b[0;34m(self, b)\u001b[0m\n\u001b[1;32m    704\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m:\n\u001b[1;32m    705\u001b[0m     \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 706\u001b[0m         \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_sock\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrecv_into\u001b[49m\u001b[43m(\u001b[49m\u001b[43mb\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    707\u001b[0m     \u001b[38;5;28;01mexcept\u001b[39;00m timeout:\n\u001b[1;32m    708\u001b[0m         \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_timeout_occurred \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n",
      "File \u001b[0;32m~/.pyenv/versions/3.11.4/lib/python3.11/ssl.py:1278\u001b[0m, in \u001b[0;36mSSLSocket.recv_into\u001b[0;34m(self, buffer, nbytes, flags)\u001b[0m\n\u001b[1;32m   1274\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m flags \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[1;32m   1275\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m   1276\u001b[0m           \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnon-zero flags not allowed in calls to recv_into() on \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m%\u001b[39m\n\u001b[1;32m   1277\u001b[0m           \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m)\n\u001b[0;32m-> 1278\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread\u001b[49m\u001b[43m(\u001b[49m\u001b[43mnbytes\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbuffer\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   1279\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m   1280\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28msuper\u001b[39m()\u001b[38;5;241m.\u001b[39mrecv_into(buffer, nbytes, flags)\n",
      "File \u001b[0;32m~/.pyenv/versions/3.11.4/lib/python3.11/ssl.py:1134\u001b[0m, in \u001b[0;36mSSLSocket.read\u001b[0;34m(self, len, buffer)\u001b[0m\n\u001b[1;32m   1132\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m   1133\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m buffer \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m-> 1134\u001b[0m         \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_sslobj\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mlen\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbuffer\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   1135\u001b[0m     \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m   1136\u001b[0m         \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_sslobj\u001b[38;5;241m.\u001b[39mread(\u001b[38;5;28mlen\u001b[39m)\n",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
     ]
    }
   ],
   "source": [
    "MAX_BULK_SIZE = 1000\n",
    "MAX_BULK_UPLOADS = 1000\n",
    "\n",
    "sentinel = object()\n",
    "for j in range(MAX_BULK_UPLOADS):\n",
    "    documents = []\n",
    "    while len(documents) < MAX_BULK_SIZE - len(all_langs_datasets):\n",
    "        for ds in all_langs_datasets:\n",
    "            text = next(ds, sentinel)\n",
    "            if text is not sentinel:\n",
    "                documents.append(\n",
    "                    {\n",
    "                        \"_index\": \"hf-semantic-text-index\",\n",
    "                        \"_source\": {\"text_field\": text[\"text\"]},\n",
    "                    }\n",
    "                )\n",
    "                # if you are using an ingest pipeline instead of a\n",
    "                # semantic text field, use this instead:\n",
    "                # documents.append(\n",
    "                #     {\n",
    "                #         \"_index\": \"hf-endpoint-index\",\n",
    "                #         \"_source\": {\"text\": text['text']},\n",
    "                #     }\n",
    "                # )\n",
    "\n",
    "    try:\n",
    "        response = helpers.bulk(client, documents, raise_on_error=False, timeout=\"60s\")\n",
    "        print(\"Docs uplaoded:\", (j + 1) * MAX_BULK_SIZE)\n",
    "\n",
    "    except Exception as e:\n",
    "        print(\"exception:\", str(e))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf0f6df7",
   "metadata": {},
   "source": [
    "## Semantic search\n",
    "\n",
    "After the dataset has been enriched with the embeddings, you can query the data using [semantic search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-semantic-search). Pass a `query_vector_builder` to the k-nearest neighbor (kNN) vector search API, and provide the query text and the model you have used to create the embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "d9b21b71",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"English speaking countries\"\n",
    "semantic_search_results = client.search(\n",
    "    index=\"hf-semantic-text-index\",\n",
    "    query={\"semantic\": {\"field\": \"infer_field\", \"query\": query}},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "8ef79d16",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "ID: DDbC4pEBhYre9Ocn7zIr\n",
      "Score: 0.92574656\n",
      "Text: Orodha ya nchi kufuatana na wakazi\n",
      "\n",
      "ID: bjbC4pEBhYre9OcnzC3U\n",
      "Score: 0.9159906\n",
      "Text: Intercontinental Cup\n",
      "\n",
      "ID: njbC4pEBhYre9OcnzC3U\n",
      "Score: 0.91523564\n",
      "Text: รายการจัดเรียงตามทวีปและประเทศ\n",
      "\n",
      "ID: bDbC4pEBhYre9Ocn3jBM\n",
      "Score: 0.9142189\n",
      "Text: a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z\n",
      "\n",
      "ID: 8jbD4pEBhYre9OcnDTSL\n",
      "Score: 0.9127883\n",
      "Text: With Australia:\n",
      "With Adelaide United:\n",
      "\n",
      "ID: MzbC4pEBhYre9Ocn_TQ1\n",
      "Score: 0.9116771\n",
      "Text: Más información en .\n",
      "\n",
      "ID: _DbC4pEBhYre9Ocn7zEr\n",
      "Score: 0.9106927\n",
      "Text: (AS)= Asia (AF)= Afrika (NA)= Amerika ya kaskazini (SA)= Amerika ya kusini (A)= Antaktika (EU)= Ulaya na (AU)= Australia na nchi za Pasifiki.\n",
      "\n",
      "ID: fDbC4pEBhYre9Ocn7zEr\n",
      "Score: 0.9096315\n",
      "Text: Stadi za lugha ya mazungumzo ni kuzungumza na kusikiliza.\n",
      "\n",
      "ID: DDbC4pEBhYre9Ocn3jBL\n",
      "Score: 0.90771043\n",
      "Text: \"*(Meksiko mara nyingi huhesabiwa katika Amerika ya Kati kwa sababu za kiutamaduni)\"\n",
      "\n",
      "ID: IjbC4pEBhYre9Ocn3i9L\n",
      "Score: 0.9070151\n",
      "Text: Englan is a small village in the district of Wokha, in the Nagaland state of India. Its name literally means \"The Path of the Sun\". It is one of the main centers of the district and is an active center of the Lotha language and culture.\n"
     ]
    }
   ],
   "source": [
    "pretty_search_response(semantic_search_results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "a7053b1e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'inference_id': 'my_cohere_rerank_endpoint', 'task_type': 'rerank', 'service': 'cohere', 'service_settings': {'model_id': 'rerank-english-v3.0', 'rate_limit': {'requests_per_minute': 10000}}, 'task_settings': {'top_n': 100, 'return_documents': True}})"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "try:\n",
    "    client.inference.delete(inference_id=\"my_cohere_rerank_endpoint\")\n",
    "except Exception:\n",
    "    pass\n",
    "client.inference.put(\n",
    "    task_type=\"rerank\",\n",
    "    inference_id=\"my_cohere_rerank_endpoint\",\n",
    "    body={\n",
    "        \"service\": \"cohere\",\n",
    "        \"service_settings\": {\n",
    "            \"api_key\": \"<COHERE-API-KEY>\",\n",
    "            \"model_id\": \"rerank-english-v3.0\",\n",
    "        },\n",
    "        \"task_settings\": {\"top_n\": 100, \"return_documents\": True},\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "6fba4f22",
   "metadata": {},
   "outputs": [],
   "source": [
    "reranked_search_results = client.search(\n",
    "    index=\"hf-semantic-text-index\",\n",
    "    retriever={\n",
    "        \"text_similarity_reranker\": {\n",
    "            \"retriever\": {\n",
    "                \"standard\": {\n",
    "                    \"query\": {\"semantic\": {\"field\": \"infer_field\", \"query\": query}}\n",
    "                }\n",
    "            },\n",
    "            \"field\": \"text_field\",\n",
    "            \"inference_id\": \"my_cohere_rerank_endpoint\",\n",
    "            \"inference_text\": query,\n",
    "            \"rank_window_size\": 100,\n",
    "        }\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "fd4ef932",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "ID: _DbC4pEBhYre9Ocn7zEr\n",
      "Score: 0.1766716\n",
      "Text: (AS)= Asia (AF)= Afrika (NA)= Amerika ya kaskazini (SA)= Amerika ya kusini (A)= Antaktika (EU)= Ulaya na (AU)= Australia na nchi za Pasifiki.\n",
      "\n",
      "ID: zDbC4pEBhYre9OcnzC7V\n",
      "Score: 0.06394842\n",
      "Text: Waingereza nao wakatawala Afrika Mashariki na Kusini, na kuwa sehemu ya Sudan na Somalia, Uganda, Kenya, Tanzania (chini ya jina la Tanganyika), Zanzibar, Nyasaland, Rhodesia, Bechuanaland, Basutoland na Swaziland chini ya utawala wao na baada ya kushinda katika vita huko Afrika ya Kusini walitawala Transvaal, Orange Free State, Cape Colony na Natal, na huko Afrika ya Magharibi walitawala Gambia, Sierra Leone, the Gold Coast na Nigeria.\n",
      "\n",
      "ID: bDbC4pEBhYre9Ocn3jBM\n",
      "Score: 0.013532149\n",
      "Text: a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z\n",
      "\n",
      "ID: LDbD4pEBhYre9OcnHje5\n",
      "Score: 0.010130412\n",
      "Text: Mifano maarufu ya bunge ni Majumba ya Bunge mjini London, Kongresi mjini Washingtin D.C., Bundestag mjini Berlin na Duma nchini Moscow, Parlamento Italiano mjini Roma na \"Assemblée nationale\" mjini Paris. Kwa kanuni ya serikali wakilishi watu hupigia kura wanasiasa ili watimize \"matakwa\" yao. Ingawa nchi kama Israeli, Ugiriki, Uswidi na Uchina zina nyumba moja ya bunge, nchi nyingi zina nyumba mbili za bunge, kumaanisha kuwa zina nyumba mbili za kibunge zinazochaguliwa tofauti. Katika 'nyumba ya chini' wanasiasa wanachaguliwa kuwakilisha maeneo wakilishi bungeni. 'Nymba ya juu' kawaida huchaguliwa kuwakilisha majimbo katika mfumo wa majimbo (kama vile nchii Australia, Ujerumani au Marekani) au upigaji kura tofauti katika katika mfumo wa umoja (kama vile nchini Ufaransa). Nchini Uingereza nyumba ya juu inachaguliwa na na serikali kama nyumba ya marudio. Ukosoaji mmoja wa mifumo yenye nyumba mbili yenye nyumba mbili zilizochaguliwa ni kuwa nyumba ya juu na ya chini huenda zikafanana. Utetezi wa tangu jadi wa mifumo ya nyumba mbili nni kuwa chumba cha juu huwa kama nyumba ya marekebisho. Hili linaweza kupunguza uonevu na dhuluma katika hatua ya kiserikali\", 101\n",
      "\n",
      "ID: lzbC4pEBhYre9Ocn7zIr\n",
      "Score: 0.0033897832\n",
      "Text: इसके अलावा हिन्दी और संस्कृत में\n",
      "\n",
      "ID: wDbC4pEBhYre9Ocn7zIr\n",
      "Score: 0.0025311112\n",
      "Text: 2. التزام بريطانيا وفرنسا وفيما بعد إيطاليا بإدارة دولية لفلسطين.\n",
      "\n",
      "ID: IjbC4pEBhYre9Ocn3i9L\n",
      "Score: 0.0023596606\n",
      "Text: Englan is a small village in the district of Wokha, in the Nagaland state of India. Its name literally means \"The Path of the Sun\". It is one of the main centers of the district and is an active center of the Lotha language and culture.\n",
      "\n",
      "ID: jTbD4pEBhYre9OcnDTWL\n",
      "Score: 0.0022694687\n",
      "Text: ఇండియా గేటు\n",
      "\n",
      "ID: 4zbC4pEBhYre9Ocn_TM0\n",
      "Score: 0.0018458483\n",
      "Text: Más información en la web de la Generalidad Valenciana o en la web de la FEDME\n",
      "\n",
      "ID: 8jbD4pEBhYre9OcnDTSL\n",
      "Score: 0.0016875096\n",
      "Text: With Australia:\n",
      "With Adelaide United:\n"
     ]
    }
   ],
   "source": [
    "pretty_search_response(reranked_search_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e4055ba",
   "metadata": {},
   "source": [
    "**NOTE:** The value of `model_id` in the `query_vector_builder` must match the value of `inference_id` you created in the [first step](#create-the-inference-endpoint)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
